Intrinsic Geometry of Stochastic Gradient Descent Algorithms
Authors
Abstract
We consider the intrinsic geometry of stochastic gradient descent (SG) algorithms. We show how to derive SG algorithms that fully respect an underlying geometry, which can be induced either by prior knowledge in the form of a preferential structure or by a generative model via the Fisher information metric. We show that, using the geometrically motivated update and the “correct” loss function, the implicit and explicit discrete-time updates are, under certain conditions, identical. This new loss function reduces to the least-squares loss for linear regression with Gaussian measurement noise. We also show that the seemingly obvious requirement that the loss function be convex is not appropriate in non-flat geometries. We illustrate the power of the new framework by deriving an algorithm for a regression problem over a multinomial distribution.
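As a point of reference only, the sketch below shows one way a geometry-respecting stochastic step can look in code for the Gaussian linear-regression case mentioned in the abstract: the Euclidean gradient is preconditioned by a running estimate of the Fisher information. It is a minimal illustration under those assumptions, not the algorithm derived in the paper; the function name, decay constant, and damping term are hypothetical choices.

```python
import numpy as np

# Minimal sketch (not the paper's exact algorithm): one stochastic step that
# preconditions the Euclidean gradient with a running Fisher information
# estimate. The name `natural_sgd_step`, the decay constant 0.9, and the
# damping term 1e-6 are illustrative assumptions.

def natural_sgd_step(w, x, y, fisher, lr=0.1):
    """One geometry-aware step for linear regression with Gaussian noise."""
    residual = x @ w - y                              # prediction error on this sample
    grad = residual * x                               # Euclidean gradient of 0.5 * residual**2
    fisher[:] = 0.9 * fisher + 0.1 * np.outer(x, x)   # running Fisher estimate (updated in place)
    nat_grad = np.linalg.solve(fisher + 1e-6 * np.eye(len(w)), grad)
    return w - lr * nat_grad

# Toy usage: recover a known weight vector from noisy linear measurements.
rng = np.random.default_rng(0)
w_true = np.array([2.0, -1.0])
w, fisher = np.zeros(2), np.eye(2)
for _ in range(500):
    x = rng.normal(size=2)
    y = x @ w_true + 0.1 * rng.normal()
    w = natural_sgd_step(w, x, y, fisher)
print(w)  # should be close to w_true
```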
Similar references
Conjugate gradient neural network in prediction of clay behavior and parameters sensitivities
The use of artificial neural networks has increased in many areas of engineering. In particular, this method has been applied to many geotechnical engineering problems and demonstrated some degree of success. A review of the literature reveals that it has been used successfully in modeling soil behavior, site characterization, earth retaining structures, settlement of structures, slope stabilit...
Stability and Generalization of Learning Algorithms that Converge to Global Optima
We establish novel generalization bounds for learning algorithms that converge to global minima. We do so by deriving black-box stability results that only depend on the convergence of a learning algorithm and the geometry around the minimizers of the loss function. The results are shown for nonconvex loss functions satisfying the Polyak-Łojasiewicz (PL) and the quadratic growth (QG) conditions...
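For reference, the two conditions named here are standardly stated as follows, with f^* the minimum value of f, X^* its set of minimizers, and μ > 0 a constant; this is the textbook form, not necessarily the exact variant used in that work:

\[
\text{(PL)}\quad \tfrac{1}{2}\,\lVert \nabla f(x) \rVert^{2} \;\ge\; \mu \bigl(f(x) - f^{*}\bigr),
\qquad
\text{(QG)}\quad f(x) - f^{*} \;\ge\; \tfrac{\mu}{2}\,\operatorname{dist}(x, X^{*})^{2}.
\]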
Fastest Rates for Stochastic Mirror Descent Methods
Relative smoothness, a notion introduced in [6] and recently rediscovered in [3, 18], generalizes the standard notion of smoothness typically used in the analysis of gradient-type methods. In this work we take ideas from the well-studied field of stochastic convex optimization and use them to obtain faster algorithms for minimizing relatively smooth functions. We propose and analyze ...
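As a reminder of the definition behind this snippet (standard formulation; the symbols are generic): f is L-smooth relative to a reference function h when

\[
f(y) \;\le\; f(x) + \langle \nabla f(x),\, y - x \rangle + L\, D_h(y, x),
\qquad
D_h(y, x) = h(y) - h(x) - \langle \nabla h(x),\, y - x \rangle,
\]

and the (stochastic) mirror descent step then replaces the Euclidean proximal term by the Bregman divergence D_h, with g_k a stochastic gradient at x_k and η_k a step size:

\[
x_{k+1} \;=\; \arg\min_{x} \Bigl\{ \langle g_k,\, x \rangle + \tfrac{1}{\eta_k}\, D_h(x, x_k) \Bigr\}.
\]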
Decoupling the Data Geometry from the Parameter Geometry for Stochastic Gradients
Large-scale learning problems require algorithms that scale benignly with respect to the size of the dataset and the number of parameters to be trained, leading numerous practitioners to favor the classic stochastic gradient descent (SGD [1, 2, 3]) over more sophisticated methods. Besides its fast convergence, SGD has been observed to sometimes lead to significantly better generalization perform...
Adaptive On-Line Learning Algorithms for Blind Separation: Maximum Entropy and Minimum Mutual Information
There are two major approaches for blind separation: Maximum Entropy (ME) and Minimum Mutual Information (MMI). Both can be implemented by the stochastic gradient descent method for obtaining the de-mixing matrix. The MI is the contrast function for blind separation while the entropy is not. To justify the ME, the relation between ME and MMI is firstly elucidated by calculating the first derivative...
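For orientation, a widely used stochastic natural-gradient rule for updating the de-mixing matrix W in this setting (the textbook form popularized by Amari and co-authors, not necessarily the exact update derived in the cited work) is

\[
W_{t+1} \;=\; W_t + \eta_t \bigl( I - \phi(y_t)\, y_t^{\top} \bigr) W_t,
\qquad y_t = W_t x_t,
\]

where x_t is the observed mixture, φ an elementwise nonlinearity chosen to match the source distributions, and η_t the learning rate.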
Publication date: 2005